Noisy Data Make the Partial Digest Problem NP-hard
Authors
Abstract
The Partial Digest problem, well known for its applications in computational biology and for the intriguingly open status of its computational complexity, asks for the coordinates of $n$ points on a line such that the pairwise distances of the points form a given multiset of $\binom{n}{2}$ distances. In an effort to model real-life data, we study the computational complexity of a minimization version of Partial Digest, in which only a subset of all pairwise distances is given and the rest are missing due to experimental errors. We show that this variation is NP-hard to solve exactly, which makes the existence of polynomial-time algorithms for it extremely unlikely. Our result answers an open question posed by Pevzner (2000). We then study a maximization version of Partial Digest in which a superset of all pairwise distances is given, with some additional distances due to inaccurate measurements. We show that this maximization version is NP-hard to approximate to within a factor of $|D|^{1/2-\varepsilon}$ for any $\varepsilon > 0$, where $|D|$ is the number of input distances; hence polynomial-time algorithms cannot even guarantee to find a solution that comes close to the optimum. Our inapproximability result is tight up to low-order terms, as we give a trivial approximation algorithm that achieves a matching approximation ratio. The two optimization variants model two different types of error that occur in real-life data.
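The two error models can be made concrete with a small Python sketch. It assumes, as one natural reading of the abstract, that a solution is a point set $P$ with distance multiset $\Delta(P)$: in the minimization variant every measured distance must appear in $\Delta(P)$ (with as few points as possible), while in the maximization variant every distance of $P$ must appear in $D$ (with as many points as possible). All helper names are hypothetical, and the two-point solution is just one trivial algorithm with an $O(|D|^{1/2})$ ratio, not necessarily the one given in the paper.

```python
from collections import Counter
from itertools import combinations
from math import isqrt


def distance_multiset(points):
    """Multiset Delta(P) of all pairwise distances |p - q| in the point set P."""
    return Counter(abs(p - q) for p, q in combinations(points, 2))


def is_sub_multiset(small, big):
    """True if every value occurs in `big` at least as often as in `small`."""
    return all(big[x] >= count for x, count in small.items())


def explains_with_missing(points, d):
    # Minimization variant (assumed reading): some distances were lost, so the
    # measured multiset D only has to be contained in Delta(P).
    return is_sub_multiset(Counter(d), distance_multiset(points))


def explains_with_extra(points, d):
    # Maximization variant (assumed reading): D may contain spurious distances,
    # so every distance of P must be found in D.
    return is_sub_multiset(distance_multiset(points), Counter(d))


def trivial_max_solution(d):
    # Any single measured distance x gives the feasible two-point set {0, x}.
    # Assumes D is non-empty.
    return [0, d[0]]


def max_points_upper_bound(d):
    # A feasible P has |P|(|P| - 1)/2 <= |D| distances, so
    # |P| <= (1 + sqrt(1 + 8|D|)) / 2; isqrt keeps the bound exact.
    return (1 + isqrt(1 + 8 * len(d))) // 2
```

For example, the points 0, 2, 4, 7, 10 produce the ten distances 2, 2, 3, 3, 4, 5, 6, 7, 8, 10. Since the optimum of the maximization variant has at most about $\sqrt{2|D|}$ points and the trivial solution always has 2, the ratio between them grows like $|D|^{1/2}$.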
Similar resources
Noisy Data Make the Partial Digest Problem NP-hard
The problem of finding the coordinates of $n$ points on a line such that the pairwise distances of the points form a given multiset of $\binom{n}{2}$ distances is known as the Partial Digest problem; it occurs, for instance, in DNA physical mapping and de novo sequencing of proteins. Although Partial Digest was already proposed as a combinatorial problem in the 1930s, its computational complexity is still u...
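For the exact, error-free problem, a common approach is the backtracking strategy usually attributed to Skiena, Smith and Lemke: the largest remaining distance $y$ must reach one of the two end points, so the next point is placed either at $y$ or at $\text{width} - y$. The Python sketch below follows that idea; the function names are hypothetical, distances are assumed to be exactly comparable (e.g. integers), and the search can take exponential time in the worst case.

```python
from collections import Counter


def _distances_to(y, points):
    """Multiset of distances from a candidate point y to the placed points."""
    return Counter(abs(y - x) for x in points)


def partial_digest(distances):
    """Return a sorted point set X starting at 0 with Delta(X) == distances,
    or None if no such set exists (backtracking, exponential worst case)."""
    pool = Counter(distances)
    if not pool:
        return [0]  # a single point has no pairwise distances
    width = max(pool)  # the largest distance spans the two end points
    pool[width] -= 1
    points = [0, width]

    def place():
        live = +pool  # unary + drops entries whose count fell to 0
        if not live:
            return True  # every distance is explained
        y = max(live)  # the largest remaining distance must touch an end point
        candidates = [y] if 2 * y == width else [y, width - y]
        for candidate in candidates:
            needed = _distances_to(candidate, points)
            if all(pool[d] >= c for d, c in needed.items()):
                pool.subtract(needed)
                points.append(candidate)
                if place():
                    return True
                points.pop()  # undo and try the mirrored position
                pool.update(needed)
        return False

    return sorted(points) if place() else None
```

On the input 2, 2, 3, 3, 4, 5, 6, 7, 8, 10 this returns [0, 3, 6, 8, 10], the mirror image of 0, 2, 4, 7, 10; both point sets have the same distance multiset, so solutions are only determined up to reflection (and, in general, not even up to that).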
Modeling of Partial Digest Problem as a Network flows problem
Restriction Site Mapping is one of the interesting tasks in Computational Biology. A DNA strand can be thought of as a string on the letters A, T, C, and G. When a particular restriction enzyme is added to a DNA solution, the DNA is cut at particular restriction sites. The goal of the restriction site mapping is to determine the location of every site for a given enzyme. In partial digest metho...
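To connect restriction site mapping with the distance formulation, the sketch below simulates an idealized digest of a DNA string. It assumes, as a simplification, that the enzyme cuts at the first base of every occurrence of a given recognition sequence (real enzymes cut at an enzyme-specific offset) and that an ideal partial digest reports every fragment bounded by two ends. The function names and the example sequence are hypothetical.

```python
from itertools import combinations


def cut_sites(dna, site):
    """Positions where `site` occurs in `dna` (overlapping matches included)."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def digest_fragments(dna, site, partial=True):
    # Fragment ends: both ends of the molecule plus every cut position.
    ends = sorted({0, len(dna), *cut_sites(dna, site)})
    if partial:
        # Partial digest: some cuts are skipped, so any two ends can bound a
        # fragment; the ideal experiment yields all pairwise distances.
        return sorted(b - a for a, b in combinations(ends, 2))
    # Full digest: every site is cut, so only neighbouring ends bound fragments.
    return [b - a for a, b in zip(ends, ends[1:])]
```

For the string "AAGAATTCAAAAGAATTCA" and site "GAATTC", the cuts are at 2 and 12, the full digest gives fragments 2, 10, 7, and the partial digest gives 2, 7, 10, 12, 17, 19: exactly the pairwise distances of the points 0, 2, 12, 19.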
Measurement Errors Make the Partial Digest Problem NP-Hard
The Partial Digest problem asks for the coordinates of m points on a line such that the pairwise distances of the points form a given
Double Digest Revisited: Complexity and Approximability in the Presence of Noisy Data
We revisit the double digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that double digest is strongly NP-complete, improving previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites ...
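The double digest problem can be illustrated with an exhaustive search: given the fragment lengths A from the first enzyme, B from the second, and C from both enzymes together, try every ordering of A and of B and check whether the merged cut positions reproduce C. This Python sketch is hypothetical (the function names are ours); it allows coincident cut sites by merging them into a single cut, and it takes factorial time, so it is only usable on tiny instances.

```python
from itertools import permutations


def cut_positions(fragments):
    """Cut positions when the fragments are laid out left to right from 0."""
    positions, total = [], 0
    for length in fragments[:-1]:
        total += length
        positions.append(total)
    return positions


def double_digest(a, b, c):
    """Return orderings (order_a, order_b) whose combined cuts yield the
    multiset c of double-digest fragments, or None (brute force)."""
    if not (sum(a) == sum(b) == sum(c)):
        return None  # all three digests must come from the same molecule
    total = sum(a)
    target = sorted(c)
    for order_a in sorted(set(permutations(a))):
        cuts_a = cut_positions(order_a)
        for order_b in sorted(set(permutations(b))):
            # Coincident cuts from both enzymes collapse into one cut here.
            ends = sorted({0, total, *cuts_a, *cut_positions(order_b)})
            pieces = [y - x for x, y in zip(ends, ends[1:])]
            if sorted(pieces) == target:
                return list(order_a), list(order_b)
    return None
```

For instance, A = [3, 7] and B = [5, 5] are consistent with C = [2, 3, 5] (cuts at 3 and 5 on a molecule of length 10), but not with C = [1, 4, 5].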
A Continuous Optimization Model for Partial Digest Problem
The purpose of this paper is to model the Partial Digest Problem (PDP) as a mathematical programming problem. In this paper we present a new viewpoint on PDP. We formulate the PDP as a continuous optimization problem and develop a method to solve this problem. Finally, we construct a linear programming model for the problem with an additional constraint. This latter model can be solved by the simp...